A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach

Authors

Abstract

Temporal Sentence Grounding in Videos (TSGV), which aims to ground a natural language sentence that indicates complex human activities in an untrimmed video, has drawn widespread attention over the past few years. However, recent studies have found that current benchmark datasets may have obvious moment annotation biases, enabling several simple baselines, even without training, to achieve state-of-the-art (SOTA) performance. In this paper, we take a closer look at existing evaluation protocols for TSGV, and find that both the prevailing dataset splits and evaluation metrics are the devils that lead to untrustworthy benchmarking. Therefore, we propose to re-organize the two widely-used datasets, making the ground-truth moment distributions different in the training and test splits, i.e., an out-of-distribution (OOD) test. Meanwhile, we introduce a new evaluation metric, “dR@n,IoU=m”, that discounts the basic recall scores, especially with small IoU thresholds, so as to alleviate the inflated evaluation caused by biased datasets with a large proportion of long ground-truth moments. New benchmarking results indicate that our proposed evaluation protocols can better monitor research progress in TSGV. Furthermore, we propose a novel causality-based Multi-branch Deconfounding Debiasing (MDD) framework for unbiased moment prediction. Specifically, we design a multi-branch deconfounder to eliminate the effects caused by multiple confounders with causal intervention. In order to help the model better align the semantics between sentence queries and video moments, we enhance the representations during feature encoding. For textual information, the query is parsed into several verb-centered phrases to obtain a more fine-grained textual feature. For visual information, the positional information is decomposed from the moment features to enhance the representations of moments with diverse locations. Extensive experiments demonstrate that our proposed approach achieves competitive results among existing SOTA approaches and outperforms the base model with great gains.
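The idea of a discounted recall can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: a hit (IoU at or above the threshold m within the top n predictions) is scaled by two factors that shrink as the predicted start and end drift from the ground truth, with the drift normalized by video duration. The function names, the normalization, and the choice to keep the best-scoring hit when n > 1 are hypothetical and may differ from the paper's definition.

```python
from typing import List, Sequence, Tuple

Moment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two temporal segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def discounted_recall(
    preds: Sequence[List[Moment]],  # ranked predictions per query
    gts: Sequence[Moment],          # one ground-truth moment per query
    durations: Sequence[float],     # video duration per query, in seconds
    n: int = 1,
    m: float = 0.5,
) -> float:
    """Assumed form of a discounted R@n,IoU=m (illustrative only).

    A query scores 0 if none of its top-n predictions reaches IoU >= m.
    Otherwise it scores the product of two discount factors, one for the
    start boundary and one for the end boundary, each equal to 1 minus
    the boundary error normalized by the video duration.
    """
    total = 0.0
    for ranked, gt, dur in zip(preds, gts, durations):
        best = 0.0
        for p in ranked[:n]:
            if temporal_iou(p, gt) >= m:
                alpha_start = 1.0 - abs(p[0] - gt[0]) / dur
                alpha_end = 1.0 - abs(p[1] - gt[1]) / dur
                # Assumption: keep the best-discounted hit among the top n.
                best = max(best, alpha_start * alpha_end)
        total += best
    return total / len(gts)


# A prediction of (2 s, 10 s) against a ground truth of (0 s, 10 s) in a
# 20 s video has IoU 0.8, so plain R@1,IoU=0.5 would count it as a full hit.
score = discounted_recall([[(2.0, 10.0)]], [(0.0, 10.0)], [20.0])
```

Under these assumptions the example scores about 0.9 rather than 1.0, because the start boundary is off by 10% of the video length. The same mechanism penalizes loose predictions that only clear a small IoU threshold by covering most of a long ground-truth moment.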

Similar resources

Quality metric design: a closer look

The design of reliable visual quality metrics is complicated by our limited knowledge of the human visual system and the resulting variety of pertinent vision models. We have begun to analyze and compare a number of implementation choices for some components found in most of today’s visual quality metrics that are based on a model of human vision and present the first results here.

A closer look at rock physics models and their assisted interpretation in seismic exploration

Subsurface rocks and their fluid content along with their architecture affect reflected seismic waves through variations in their travel time, reflection amplitude, and phase within the field of exploration seismology. The combined effects of these factors make subsurface interpretation by using reflection waves very difficult. Therefore, assistance from other subsurface disciplines is needed i...

A Closer Look at Preduction

In a recent paper, Jun Arima formalises the idea of preduction, a non-monotonic schema providing a common basis for inductive and analogical reasoning. We examine some of the implications of this schema, as well as its connections with prototypes and conceptual spaces. A generalised version of his schema is motivated and developed to address difficulties with asymmetry and disjuncts.

A Closer Look at HMAC

Bellare, Canetti and Krawczyk [BCK96] show that cascading an ε-secure (fixed input length) PRF gives an O(εnq)-secure (variable input length) PRF when making at most q prefix-free queries of length n blocks. We observe that this translates to the same bound for NMAC (which is the cascade without the prefix-free requirement but an additional application of the PRF at the end), and give a matchin...

A Closer Look at Trumping

According to Schaffer (2000a), “trumping preemption” is a category of redundant causation distinct from early and late preemption and from overdetermination. I show that the putative causal difference between causal processes in cases thought to be trumping preemption generates early preemption or overdetermination rather than trumping. I draw a novel lesson from cases thought to be trumping: t...

Journal

Journal title: ACM Transactions on Multimedia Computing, Communications, and Applications

Year: 2023

ISSN: 1551-6857, 1551-6865

DOI: https://doi.org/10.1145/3565573